A New Hybrid Machine Translation Approach Using Cross-Language Information Retrieval and Only Target Text Corpora
Abstract
Parallel corpora play a vital role in Statistical Machine Translation, and their unavailability is a major barrier to adding new language pairs. In this paper, we propose a new hybrid approach for English-French machine translation that combines a cross-language search engine with a statistical language model trained on a monolingual corpus. The cross-language search engine returns translation candidates ordered by relevance, and the language model of the target language is used to disambiguate the translation. This approach has been evaluated and compared to Moses. We used 100,000 French sentences from the Europarl corpus to train the language model, 1,103 English-French sentences from the Arcade-II corpus as the translation reference, and the BLEU score as the evaluation metric. The obtained scores are 21.33% for our approach and 21.45% for Moses. The experimental results also showed that our approach provides better translation performance in terms of grammatical coherence.
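The disambiguation step described in the abstract can be illustrated with a small sketch. The abstract does not specify the language model order, the smoothing, or how retrieval relevance and language model scores are combined, so everything below is an assumption made for illustration: a toy add-one smoothed bigram model, whitespace tokenization, and the hypothetical helper names `train_bigram_lm`, `log_prob`, and `rerank`. It is not the authors' implementation.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized target-language sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Length-normalized log-probability under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    vocab_size = len(unigrams)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(tokens) - 1)


def rerank(candidates, unigrams, bigrams):
    """Order translation candidates from most to least fluent according to the model."""
    return sorted(candidates, key=lambda c: log_prob(c, unigrams, bigrams), reverse=True)


# Toy monolingual French corpus (assumed data, standing in for Europarl).
corpus = [
    "le parlement adopte la résolution",
    "la commission adopte le rapport",
    "le rapport est adopté",
]
unigrams, bigrams = train_bigram_lm(corpus)

# Hypothetical candidates, as a cross-language search engine might return them.
candidates = [
    "rapport le adopte commission la",
    "la commission adopte le rapport",
]
best = rerank(candidates, unigrams, bigrams)[0]
print(best)
```

In this sketch the language model alone decides the order. A fuller system would presumably also weigh the relevance score returned by the search engine, but the abstract gives no detail on that combination.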
Similar resources
A Hybrid Approach for Machine Translation Based on Cross-language Information Retrieval
This paper presents a hybrid approach for Machine Translation (MT) based on Cross-language Information Retrieval (CLIR). This approach uses linguistic and statistical processing and does not need parallel corpora as linguistic resources. A first experimental evaluation of this approach has been done on the CESTA corpus and the obtained results seem good and encouraging. The next step is the TAL...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, with the hyphen separating the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of a half-space character. This common incorrect use of the space leads to some s...
Automatic extraction of bilingual word pairs using inductive chain learning in various languages
In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. In cross-language information retrieval, the system must deal with various languages. Therefore, automatic extraction of bilingual word pairs from parallel corpora with various languages is important. However, previous works based on statistical methods are insufficien...
Using Query-Relevant Documents Pairs for Cross-Lingual Information Retrieval
The world wide web is a natural setting for cross-lingual information retrieval. The European Union is a typical example of a multilingual scenario, where multiple users have to deal with information published in at least 20 languages. Given queries in some source language and a target corpus in another language, the typical approximation consists in translating either the query or the target d...
Multilingual Document Alignment - A Study with Chinese and Japanese
The natural language processing (NLP) community is increasingly using parallel and comparable corpora for cross-linguistic research. The knowledge extracted from such corpora helps us in cross-language information retrieval, topic detection and tracking, machine translation, and many other NLP tasks. Parallel or comparable corpora of the Japanese-Chinese language pair are rare. We investigate an automatic...
Publication date: 2011